Abstract

Background: Chrysanthemyl diphosphate synthase (CDS) is a key enzyme in biosynthetic pathways producing
pyrethrins and irregular monoterpenes. These compounds are confined to plants of the tribe Anthemideae of the
Asteraceae, and play an important role in defending the plants against herbivorous insects. It has been proposed 
that the CDS genes arose from duplication of the farnesyl diphosphate synthase (FDS) gene and have functions
different from those of FDSs. However, the timing of the duplication that gave rise to CDS and the evolutionary
force behind the functional divergence of the CDS gene are still unknown.

Results: Two duplication events were detected in the evolutionary history of the FDS gene family in the 
Asteraceae, and the second duplication led to the origin of CDS. CDS occurred after the divergence of the tribe 
Mutisieae from other tribes of Asteraceae but before the birth of the Anthemideae tribe. After its origin, CDS 
accumulated four mutations in sites homologous to the substrate-binding and catalysis sites of FDS. Of these, two 
sites were involved in the binding of the nucleophilic substrate isopentenyl diphosphate in FDS. Maximum 
likelihood analyses showed that some sites in CDS were under positive selection and were scattered throughout 
primary sequences, whereas in the three-dimensional structure model they clustered in the large central cavity. 

Conclusion: Positive selection associated with gene duplication played a major role in the evolution of CDS. 



Background 

Chrysanthemyl diphosphate synthase (CDS) catalyzes 
the condensation of two molecules of dimethylallyl 
diphosphate (DMAPP) to form chrysanthemyl diphos- 
phate and is a key enzyme in biosynthetic pathways in- 
volving the production of pyrethrins and irregular 
monoterpenes [1-4]. Irregular monoterpenes are much 
less common than other isoprenoids and are confined to 
plants of the tribe Anthemideae in the family Asteraceae 
[1,2,5]. These secondary metabolites play an important 
role in their defense against herbivorous insects [1,2,4,6-8]. 

Many enzymes involved in secondary plant metabol- 
ism are encoded by gene families that originated through 
gene duplications [9-11]. Evidence shows that the CDS 
genes resulted from duplication of the farnesyl diphos- 
phate synthase (FDS) genes that belong to a small family 
[2,12-14], with copy numbers ranging from one in grape 
and two in Arabidopsis thaliana, to five in rice [15]. In
Artemisia tridentata (Asteraceae-Anthemideae), three
FDS genes have been identified: FDS1, FDS2 and CDS
(also known as FDS5) [2]. Despite the high sequence
similarity of the three genes, divergent functions have
been found among them [1,2]. FDS1 and FDS2 catalyze
the sequential head-to-tail condensation of two molecules
of isopentenyl diphosphate (IPP) with DMAPP to
produce farnesyl diphosphate (FPP), and are involved in
the biosynthesis of regular sesquiterpenes [2], whereas
CDS condenses two molecules of DMAPP in the pathway
to irregular monoterpenes [1,2]. Moreover, FDS and its products
are found in organisms ranging from prokaryotes to 
eukaryotes [16,17], while the products of CDS are only 
present in Anthemideae [1,2,5]. Thus, FDS seems to be 
an ancient gene of which CDS is a derived or modified 
orthologous copy, and irregular monoterpenes might be 
products of a pathway arising from those of other 
isoprenoids. 

Gene duplication is prevalent in plant genomes and
the duplicated genes face different evolutionary fates, in- 
cluding pseudogenization (nonfunctionalization), reten- 
tion of the original function, subfunctionalization, and 
neofunctionalization under the functional view [18-25]. 
The duplication origin and distinct function of CDS raise 
a number of interesting questions. First, when did the 
duplication event leading to the origin of CDS take 
place? Because the products of CDS occur exclusively in 
Anthemideae [1,2,5], it would be expected that the CDS 
gene occurred at the same time as the origin of this 
tribe. To date, CDS sequences have only been cloned
from two species of Anthemideae (Pyrethrum cinerariifolium
and A. tridentata) [1,2], and no information is
available as to whether CDS occurs in any other members
of the Anthemideae or their relatives. Second,
given the fact that CDS uses a new nucleophilic
substrate to generate new products, did CDS accumulate 
mutations in sites homologous to the substrate-binding 
and catalysis sites of FDS? FDS contains five conserved 
regions, two of which are DDXXD motifs [1,2,26]. At 
the three-dimensional structure level, conserved amino-acids
in the five regions and the C-terminal of FDS are
located in a large central cavity surrounded by 10 α-helices,
and these residues have been identified as substrate-binding and
catalysis sites [27-31]. Previous studies demonstrated
that CDS has a T → G substitution in region IV and a
D → N substitution at the first aspartate in region V [1,2].

Finally, what was the evolutionary force leading to the
functional divergence of CDS? It is controversial whether
the functional divergence of duplicate genes arises from 
the relaxation of selective constraints or positive selec- 
tion [19,23,32,33]. Generally, neutral evolution with 
relaxed selective constraints is treated as the null hy- 
pothesis [19,34] and positive selection is invoked if the 
null hypothesis is rejected. Positive selection has been 
detected in many genes after their duplication [33,35]. It 
has been proposed that positive selection promotes the 
functional divergence of gene family members encoding 
enzymes involved in secondary metabolism [36,37]. 
Because CDS carries out the production of irregular 
monoterpenes that are important secondary metabolites 
for defense against herbivorous insects in Anthemideae, 
we investigated whether the functional divergence of 
CDS was driven by positive selection. 

In this study, we reconstructed the phylogeny of the 
FDS gene family based on cDNA and EST sequences 
from the main Asteraceae lineages. We detected two 
rounds of gene duplication during the evolution of the 
Asteraceae FDS gene family, and inferred the possible 
time of origin of the CDS gene. Homology modeling 
and molecular evolutionary analyses showed that two
mutations in CDS might explain why CDS, unlike FDS,
does not prefer IPP as the nucleophilic substrate, and
demonstrated that positive selection has played a role
in the functional divergence of CDS in Anthemideae.

Methods 

Amplification of FDS homologs from Anthemideae and its 
relatives 

Previous studies have well resolved the major clades of 
Asteraceae and their relationships, with the Anthemideae 
and Astereae tribes being most closely related [38-41]
(Figure 1A). In this study, four species representing four
subtribes of Anthemideae (Pyrethrum coccineum, Leucanthemum
vulgare, Achillea asiatica, and Chrysanthemum
lavandulifolium) were sampled. We also sampled
one representative species from each of four tribes: Aster
ageratoides (Astereae), Helianthus annuus (Heliantheae),
Taraxacum mongolicum (Cichorieae), and Gerbera anandria
(Mutisieae).

We first amplified FDS homologs from genomic DNA 
and then obtained the full-length cDNA using gene- 
specific primers based on the partial DNA fragments. Gen- 
omic DNA was extracted using a Plant Genomic DNA Kit 
(TianGen Biotech., Beijing, China). Two rounds of polymerase
chain reaction (PCR) were conducted to amplify
sequences corresponding to conserved regions II through
V of FDS and CDS. In the first round of PCR, the primers
CDSII (5'-CTTSTMCWTGATGACATRATGGA-3') and
CDSVb (5'-TGCATTCTTCAATATCTGTTCCMGT-3')
were used to amplify CDS, while the primers FDSII
(5'-CTKGTRCTYGATGAYATYATGGA-3') and FDSVb
(5'-TKAARKCTTCWATRTCKGTYCCWAT-3') were used
to amplify FDS. In the second round of PCR, the primers
CDSII and CDSVa (5'-CRAAAGTGTCGAGATAATCATT-3')
were used to amplify CDS, and FDSII and FDSVa
(5'-CAAAACARTCBAGATAATCRTCCT-3') were used to
amplify FDS. Each amplification reaction (20 μl) contained
1× buffer, 0.5 μM of each primer, 200 μM of each dNTP,
and 2.5 U of Taq polymerase, to which 1-1.5 μl of each
genomic DNA template was added. The thermocycling 
program comprised an initial 5 min at 95°C, followed by 35 
cycles of 1 min at 95°C, 1 min at 52-58.5°C depending on 
the DNA template, and 1 min at 72°C, with a final exten- 
sion step of 5 min at 72°C. The amplification products 
were gel-purified and cloned into pGEM-T vectors 
(Promega Corp., Madison, USA). Twenty positive clones 
were screened using restriction enzyme fragment analysis. 
All distinct clones with the correct insertion were 
sequenced and the products were run on an ABI3730 
automatic sequencer. 
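
The primers above contain IUPAC ambiguity codes (S, M, W, R, K, Y, B), so each one stands for a pool of concrete oligonucleotides. The following is a minimal Python sketch, not part of the original protocol, that counts and enumerates that pool; the function names `degeneracy` and `expand_primer` are illustrative, and the primer string is taken as transcribed in this text.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand_primer(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(seq) for seq in product(*choices)]


# Primer CDSII as given in the Methods (5' to 3').
CDSII = "CTTSTMCWTGATGACATRATGGA"

if __name__ == "__main__":
    print(degeneracy(CDSII))        # four two-fold positions
    print(expand_primer(CDSII)[0])  # first member of the pool
```

Enumerating the pool is only practical for low-degeneracy primers such as these; the count alone is enough to compare primer designs.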

Total RNA from a mixture of leaves and shoots was
extracted using a TRIzol kit (Tiangen Biotech Co., Ltd,
Beijing, China), and 5' rapid amplification of cDNA ends
(5' RACE) was performed using the 5' RACE system
(Invitrogen, USA). After performing first-strand cDNA
synthesis with gene-specific primers, the original mRNA
was removed with RNase H and RNase T1, and a polyC
tail was added to the 5'-end. Then, two rounds of
PCR amplification were performed with nested primers
(Additional file 1). For 3' RACE, first-strand cDNA
was produced using SuperScript™ II Reverse Transcriptase
(Invitrogen, USA). Then, one or two rounds of PCR amplification
were carried out with nested primers (Additional
file 1). The amplification products (5' RACE and 3' RACE)
were purified, cloned into pGEM-T vectors (Promega
Corp., Madison, USA) and sequenced. The sequences were
deposited in GenBank (accession numbers in Additional
file 2).

Figure 1 Phylogenetic relationships of major clades of Asteraceae and phylogeny of the FDS gene family. (A) Phylogenetic relationships
of major clades of Asteraceae modified from Figure 1 of Panero & Funk [41]. Bold branches indicate those on which the duplication
event giving rise to CDS most likely occurred. (B) Phylogeny of the FDS gene family reconstructed by the ML method. Numbers next to branches are bootstrap
percentages from Maximum Likelihood analysis, and posterior probabilities from Bayesian analysis. The accession numbers of each sequence are
given in Additional file 2. The arrows indicate the duplication events. The asterisks indicate sequences that were not used in the codon-based
analysis because of their incomplete coding regions. The colors indicate the tribes to which the species belong.

EST database search 

A survey of GenBank revealed that more than 1 million 
expressed sequence tag (EST) sequences were available 
for species from five tribes of Asteraceae [42]. Most of 
these sequences were contributed by the Compositae 
Genome Project. In the present study, libraries with 
more than 15,000 ESTs were used. For genera such as 
Helianthus, although EST libraries for several species 
are available, we used only one species to represent each 
genus. Hence, EST libraries for nine Asteraceae species 
representing five tribes were downloaded from GenBank. 
These species were Helianthus exilis (Heliantheae), 
Cichorium intybus, Lactuca saligna and Taraxacum offi- 
cinale (Cichorieae), Centaurea solstitialis, Carthamus 
tinctorius and Cynara scolymus (Cardueae), Gerbera 
hybrida (Mutisieae), and Barnadesia spinosa (Barnade- 
sieae). A Blast database for each EST library was con- 
structed using the Formatdb program implemented in 
stand-alone Blast 2.2.13 software [43]. Blast searches for 
sequences similar to FDS in each database were conducted
using the TBlastN program with the FDS1 protein
sequence of A. tridentata [2] as the query.
Overlapping ESTs were assembled manually. Detailed 
information for contigs and singletons (represented by 
single reads) is listed in Additional file 2. 

Phylogenetic analysis 

Phylogenetic analyses were based on the coding sequences. 
The FDS2-like sequence of Lactuca saligna was excluded
because of a large stretch of missing bases (240 bp) at 
the N-terminal. Sequences were aligned based on the 
translated amino-acid sequences using ClustalW in 
DAMBE [44]. 

Five additional FDS sequences from species of Aster- 
ids were retrieved from NCBI nonredundant sequence 
databases, with one sequence from a Gentianaceae spe- 
cies as the outgroup (for sequence information, see 
Additional file 2). Maximum likelihood (ML) analysis 
implemented in PHYML version 3.0 [45] and Bayesian 
inference (BI) implemented in MrBayes version 3.1.2
[46] were used to construct phylogenetic trees. The
best-fit evolutionary model, GTR + I + G, was selected
with the Akaike information criterion using MODELTEST
3.06 [47]. For the ML analysis, the starting tree
was obtained with BioNJ, and parameter values were 
estimated from the data. Branch support was estimated 
from 1000 bootstrap replicates (BP). In the BI analysis, 
two independent Markov chain Monte Carlo runs were
performed simultaneously, each starting from a random tree, for 10
million generations, sampled every 1000 generations. The
first 10% of samples were discarded as burn-in, and the 
remaining trees were used to construct the 50% majority- 
rule consensus tree. 

Selection test 

The codeml program in the PAML 4b package [48] was
used to analyze possible positive selection acting on the
FDS gene family. To reduce the impact of missing sites,
our analyses were limited to FDS genes that contained
the full-length coding region. First, branch models allowing
the ω ratio (ω = dN/dS, where dN is the non-synonymous
substitution rate and dS is the synonymous substitution
rate) to vary among lineages [49] were used to determine
whether the selective pressure differed among
lineages. The one ratio model (M0) assumes the same ω
for all branches and all sites. The free ratio model (Mf)
assumes an independent ω parameter for each branch in
the tree. In the phylogenetic analyses, three major clades,
FDS1, FDS2, and CDS, were resolved. We assigned ω1, ω2,
and ωc to the lineages ancestral to the FDS1, FDS2, and
CDS clades, respectively. The two ratio models (M2a-M2c)
assumed one ω ratio for branches of interest and the other
ratio, ω0, for all other branches; e.g., M2c assumed ωc for
the branch ancestral to CDS and ω0 for all other branches
(ω1 = ω2 = ω0). The three ratio models (M3a-M3c)
assumed two branches of interest with different ω ratios,
with all other branches sharing the ratio ω0. A more complex
four ratio model (M4a) assumed four independent ω
ratios: one ratio each for the ancestral branches of FDS1
(ω1), FDS2 (ω2), and CDS (ωc), and one for all other
branches (ω0). These models were compared using likelihood
ratio tests (LRTs) of the log likelihood (lnL) to check
which model fit the data significantly better.
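
The likelihood ratio tests used throughout this section can be illustrated with a short Python sketch. It assumes the conventional setup in which twice the log-likelihood difference between nested models (2ΔL) is compared with a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters; the helper names `chi2_sf` and `lrt` are illustrative and are not part of PAML. The log-likelihood values are those reported in Table 1.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Closed form for even df: exp(-y) * sum_{k < df/2} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: start from df = 1 and add one term per extra pair of df.
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def lrt(lnl_null, lnl_alt, df):
    """Return (2*deltaL, p-value) for a nested model comparison."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # M2c vs. M0 from Table 1: one extra free omega parameter (assumed df = 1).
    stat, p = lrt(-10227.98, -10210.87, df=1)
    print(f"2dL = {stat:.2f}, significant at 0.001: {p < 0.001}")
```

A negative statistic, as reported for M3c vs. M2c, is returned with a p-value of 1.0 in this sketch, since the more complex model cannot be preferred in that case.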

Because the branch models average the ω ratio over all
sites and are unable to detect a signal of positive selection in many
cases, site-specific models allowing ω to vary among
sites [50,51] were subsequently used to determine
whether particular amino-acid residues within the FDS gene
family have been subject to positive selection. In
addition to the one ratio model (M0), five site models
(M1, M2, M3, M7, and M8) [50,51] were used. The
nearly neutral model (M1) assumes two classes of sites:
conserved sites under strict constraint (0 < ω < 1) and
others under neutral evolution (ω = 1). The positive
selection model (M2) is an extension of M1 and assumes
a third class of positively-selected sites (ω > 1). The
discrete model (M3) uses a general discrete distribution
with three site classes. The beta model (M7) assumes a
beta distribution for the ω ratios over sites, while the
beta&ω model (M8) adds another site class to M7,
allowing the ω values to exceed 1. Three LRTs of nested
models were applied: M0 versus M3, M1 versus M2, and
M7 versus M8.

As the branch model showed that the ω value for the
branch ancestral to CDS was significantly different from
that for the other branches, we further used branch-site
model A to test for sites that were potentially under
positive selection on the branch ancestral to the CDS
subfamily [52]. The model assumes four classes of sites.
The first two have ω0 (0 < ω0 < 1) and ω1 (ω1 = 1) along
all lineages in the phylogeny, whereas the third and
fourth have ω2 along the ancestral CDS branch, but ω0
and ω1 along other background branches. The branch-site
model A was compared with the nearly neutral
model (M1).

Structure modeling 

A homology model was constructed for CDS (113995) 
based on the crystal structure of human FDS in complex 
with zoledronate and isopentenyl diphosphate (Protein 
Data Bank Accession 2F8Z). The first 50 residues in the 
N-terminus of CDS were omitted because this portion is
cleaved from the mature CDS protein [1]. The Align 2D
structure alignment program (InsightII; Accelrys, San
Diego, CA) was used to align the sequences, and the
MODELER module of InsightII was used to automatically
generate models [53]. To select the best model, all
optimized models were evaluated using the Profile-3D 
program [54]. Molecular graphics were created with 
PyMOL [55]. 

Results 

Characterization of the FDS gene family 

In total, we cloned 19 full-length cDNA sequences for the 
FDS gene family from 8 species of Asteraceae. For CDS 
in Leucanthemum vulgare and FDS2 in Aster ageratoides, 
we were unable to isolate a full-length cDNA despite
extensive attempts using different cDNA templates, primers,
PCR programs and annealing temperatures. Alignment of
ORF sequences revealed a 2-base insertion in Helianthus
annuus CDS, which was confirmed by repeated PCR 
amplification, cloning, and sequencing. This frameshift 
mutation leads to a premature stop codon, indicative of a 
nonfunctional pseudogene. 

The length of the cFDS sequences ranged from 1194
to 1470 bp, with the ORF ranging from 1029 to 1035 bp,
encoding proteins of 342 to 344 amino-acids. With the
exception of the A. ageratoides cCDS, which included a
large indel in the N-terminal, the cCDS sequences varied
in length from 1330 to 1430 bp, with the ORF having a
length of 1182 to 1197 bp, encoding proteins of 394 to
399 amino-acids. Compared to FDS, CDS exhibited an
~50-amino acid extension at its N-terminus. The extension
sequences of CDS from different species were highly
variable, being rich in serine and threonine residues (aver- 
age 22.7% serine, 9.44% threonine) and showing a lack of 
acidic amino-acids. They were identified as potential 
chloroplast transit peptides by TargetP Version 1.1 [56]. 
These peptides shared little similarity with any other data- 
base sequence entries based on Blast searches. 
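
The serine and threonine percentages quoted above are simple residue frequencies over the N-terminal extension. The following Python sketch shows how such a composition can be computed; the function name `composition_percent` and the peptide string are hypothetical illustrations and are not sequences from this study.

```python
from collections import Counter


def composition_percent(peptide, residues="ST"):
    """Percentage of each requested residue in a peptide sequence."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("peptide must not be empty")
    counts = Counter(seq)
    return {aa: 100.0 * counts[aa] / len(seq) for aa in residues}


if __name__ == "__main__":
    # Hypothetical transit-peptide-like sequence, for illustration only.
    example = "MASSTSLLSS"
    print(composition_percent(example, "ST"))  # serine and threonine content
    print(composition_percent(example, "DE"))  # acidic residues
```

Averaging the per-species values returned by such a function would give summary figures of the kind reported for the CDS extensions.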

Sixteen FDS contigs and singletons were obtained
from nine EST libraries, of which thirteen had a length
of more than 540 bp. However, there were only three
sequences with a full-length coding sequence: the FDS1
genes from Cichorium intybus, Taraxacum officinale and
Cynara scolymus.

Phylogeny of the FDS gene family 

Two phylogenetic methods (ML and BI) generated almost
the same tree topology, and thus only the ML tree is shown,
with the internal node supports from both methods
(Figure 1B). The ML tree clearly showed that all FDS
sequences from Asteraceae form a well-supported clade.
FDS sequences from species of other families fell outside
the Asteraceae clade. Within this clade, there were three
clusters of genes, one consisting of FDS1 homologs, one of
FDS2 homologs and another of CDS homologs, indicating that
two duplication events occurred during the evolution of the FDS
gene family in Asteraceae. The CDS clade was sister to that of
FDS2, suggesting that the second duplication event led to
the origin of CDS. All the Anthemideae species (marked
in red in Figure 1B) had FDS1, FDS2 and CDS
genes, whereas only two types of FDS homologs were
obtained from most species of other tribes. In Aster ageratoides
of the tribe Astereae and Helianthus species of
Heliantheae, both FDS1 and CDS homologs were found.
In species of Cichorieae and Cardueae, one type of FDS
copy clustered with the FDS1 clade and another
with the FDS2 clade. Interestingly, two FDS homologs
were cloned from Gerbera anandria of Mutisieae, a
basal tribe of Asteraceae. One of them fell in the FDS1 clade,
while the other clustered with the FDS2+CDS ancestral
clade. These findings indicated that the second duplication
event giving rise to FDS2 and CDS occurred after the
divergence of G. anandria and before the origin of the
tribe Anthemideae. Barnadesia spinosa, a species from
the most basal tribe Barnadesieae (Figure 1A), had one
FDS homolog clustered with FDS1, suggesting that the
first duplication event occurred in the common ancestor
of Asteraceae.

Figure 2 Alignment of deduced amino-acid sequences and three-dimensional models of FDS and CDS. (A) Multiple sequence alignment
of FDS and CDS. The amino-acid positions are numbered relative to CDS in P. cinerariifolium (113995). The consensus sequence highlights the five
conserved domains identified in FDS. Sites under positive selection identified by PAML are indicated by asterisks (*sites with p>80%, **sites with
p>95%, ***sites with p>99%). Amino-acids involved in the binding and catalysis of substrates [29-31,60] are marked by blue arrows. Red triangles
show four sites involved in the binding and catalysis of substrates in FDS and mutated in CDS. Species name abbreviations: Aas, Achillea asiatica;
Atr, Artemisia tridentata; Aag, Aster ageratoides; Can, Capsicum annuum; Cas, Centella asiatica; Cin, Cichorium intybus; Cla, Chrysanthemum
lavandulifolium; Csc, Cynara scolymus; Glu, Gentiana lutea; Gan, Gerbera anandria; Han, Helianthus annuus; Lvu, Leucanthemum vulgare; Hsa, Homo
sapiens (NP_001129294) (the human sequence is included in the alignment because it was used as the template in the homology modeling); Mpi,
Mentha x piperita; Pci, Pyrethrum cinerariaefolium; Pco, Pyrethrum coccineum; Sly, Solanum lycopersicum; Tmo, Taraxacum mongolicum; Tof, Taraxacum
officinale. (B) Three-dimensional model of FDS (PDB: 2F8Z). R351 and F239 are involved in IPP-binding in FDS. The red dashed lines represent
hydrogen bonds. The yellow sphere indicates water. (C) Distribution of positively-selected sites in the homology model of CDS. CDS has a
structural model very similar to FDS: the arrangement of 10 core helices around a large central cavity. The sites identified as positively-selected
(p>80%) were clustered in the large central cavity (shown in pink).



Structure modeling 

Previous studies demonstrated that the following sixteen
FDS residues are involved in substrate binding and catalysis:
56G, 57K, 60R, 96Q, 103D, 107D, 112R, 113R,
174D, 200K, 201T, 239F, 240Q, 243D, 257K, and 351R
(numbering according to 2F8Z) (Figure 2A, indicated by blue arrows)
[29-31]. Among these sites of FDS, four were mutated at the
homologous sites of CDS: T201 in FDS → G244
(or S244) in CDS; F239 in FDS → Y281 in CDS; D243 in
FDS → N285 in CDS; and R351 in FDS → G393 in CDS
(Figure 2A, indicated by red triangles). Of these mutated
sites, two (F239 and R351) involved in IPP substrate
binding in FDS are shown in Figure 2B. The structural
feature of FDS is the arrangement of 10 core helices 
around a large central cavity, and the highly-conserved 
amino-acids are all located in this cavity [28,29]. These 
and other conserved FDS features are preserved in CDS, 
as shown in Figure 2C. 

Selection test 

We compared the log likelihood values from different
branch models to explore whether the ω ratios varied
among different lineages and, particularly, whether the
ratios for each ancestral branch to the FDS1, FDS2, and
CDS subfamilies differed from those for other branches
in the phylogeny. The results are shown in Table 1. The
free ratio model (Mf) fit the data significantly better
than the one ratio model (M0) (2ΔL = 236.66, P < 0.001),
suggesting that the ω ratios varied among lineages
(ranging from 0 to 0.941). However, ω < 1 on every branch under
Mf indicated purifying selection in the gene family.
The two ratio models M2a and M2b, which assigned
one ω to the ancestral lineage of FDS1 or FDS2 and
the other ratio ω0 to all other branches, produced log
likelihood values very similar to the one ratio model
and were not significantly better than the one ratio
model (for M2a vs. M0, 2ΔL = 0.02, P > 0.05; for M2b
vs. M0, 2ΔL = 0.18, P > 0.05). In contrast, the two
ratio model M2c, with ωc = 0.951 for the lineage ancestral
to CDS and ω0 = 0.122 for all other branches, was
significantly better than the one ratio model (2ΔL =
34.22, P < 0.001), indicating a significant increase in the ω
ratio for the ancestral branch of CDS. Finally, the three
ratio models, M3a, M3b, and M3c, and the four-ratio
model M4a were rejected in favor of M2c (Table 1).

The LRT for M2 vs. M1 (2ΔL = 0, P > 0.05) suggested
that the positive selection model (M2) was not significantly
better than the nearly neutral model (M1). Although models
M3 and M8 fit the data significantly better than the null
models M0 and M7 (for M3 vs. M0, 2ΔL = 327.54,
P < 0.001; for M8 vs. M7, 2ΔL = 6.32, P < 0.05), they
did not identify sites with an ω value significantly
greater than 1.

Given that the branch ancestral to CDS exhibited an
increased ω (ωc = 0.951) and CDS was endowed with a
new function in the biosynthesis of terpenoids, branch-site
model A was further used to test for evidence of
positive selection on the branch ancestral to CDS
(Table 1). LRTs showed that this model was significantly
better than the nearly neutral model M1. The parameter
estimates under branch-site model A suggested that
12.6% of codons along the CDS branch had been under
positive selection, with ω = 1.442. Bayes Empirical Bayes
(BEB) analyses showed that at the P > 50% level, branch-site
model A identified 43 sites as being potentially
subjected to positive selection on the CDS branch. At
the P > 80% level, the following 29 sites were identified:
102M, 103V, 113Q, 166E, 191Q, 198H, 211I, 218T,
220C, 225Q, 235L, 236N, 241Q, 273I, 279M, 281Y,
285N, 290T, 293D, 295D, 300T, 306E, 307C, 334I,
353K, 355A, 356Y, 386C and 394H (amino-acids refer
to 113995) (Figure 2A & C). At the P > 95% level, three
sites, 103V, 218T and 290T, were identified.

Table 1 Parameter estimates under branch, site and branch-site models

Model | p | lnL | Estimate of parameters | Models compared | 2ΔL

Branch models
M0: ω0 = ω1 = ω2 = ωc | 1 | -10227.98 | ω0 = ω1 = ω2 = ωc = 0.130 | |
Mf: ω free | 51 | -10109.65 | ω: 0 ~ 0.941 | Mf vs. M0 | 236.66***
M2a: ω0 = ω2 = ωc, ω1 | 2 | -10227.97 | ω0 = ω2 = ωc = 0.130, ω1 = 0.150 | M2a vs. M0 | 0.02
M2b: ω0 = ω1 = ωc, ω2 | 2 | -10227.89 | ω0 = ω1 = ωc = 0.129, ω2 = 0.190 | M2b vs. M0 | 0.18
M2c: ω0 = ω1 = ω2, ωc | 2 | -10210.87 | ω0 = ω1 = ω2 = 0.122, ωc = 0.951 | M2c vs. M0 | 34.22***
M3a: ω0 = ωc, ω1, ω2 | 3 | -10227.88 | ω0 = ωc = 0.129, ω1 = 0.149, ω2 = 0.190 | M3a vs. M2a | 0.18
M3b: ω0 = ω2, ω1, ωc | 3 | -10210.85 | ω0 = ω2 = 0.122, ω1 = 0.153, ωc = 0.949 | M3b vs. M2c | 0.04
M3c: ω0 = ω1, ω2, ωc | 3 | -10212.06 | ω0 = ω1 = 0.121, ω2 = 1.020, ωc = 0.756 | M3c vs. M2c | -2.38
M4a: ω0, ω1, ω2, ωc | 4 | -10210.82 | ω0 = 0.122, ω1 = 0.153, ω2 = 0.148, ωc = 0.942 | M4a vs. M2c | 0.10

Site-specific models
M1: nearly neutral | 1 | -10152.15 | p0 = 0.929, p1 = 0.071; ω0 = 0.109, ω1 = 1.000 | |
M2: positive selection | 3 | -10152.15 | p0 = 0.929, p1 = 0.071, p2 = 0.000; ω0 = 0.109, ω1 = 1.000, ω2 = 30.688 | M2 vs. M1 | 0
M3: discrete (k=2) | 5 | -10064.21 | p0 = 0.534, p1 = 0.417, p2 = 0.048; ω0 = 0.028, ω1 = 0.223, ω2 = 0.670 | M3 vs. M0 | 327.54***
M7: beta | 2 | -10064.80 | p = 0.580, q = 3.405 | |
M8: beta & ω | 4 | -10061.64 | p0 = 0.989, p = 0.646, q = 4.203 (p1 = 0.011), ω = 1.054 | M8 vs. M7 | 6.32*

Branch-site model
Model A | 3 | -10130.51 | p0 = 0.353, p1 = 0.026, p2 = 0.621, ω2 = 1.442 | MA vs. M1 | 43.28***

* significant at p < 0.05 level, *** significant at p < 0.001 level.
p, the number of free parameters for the ω ratio.

Discussion

Gene duplication is considered to be a major mechanism
in the generation of evolutionary novelty and
adaptation [23,57,58]. In plants, gene duplication followed
by functional divergence is particularly important
for the diversification of biochemical metabolites
[9,36,37,59]. Isoprenoids are a large and diverse class
of metabolites [60] derived from five-carbon isoprene 
units, which can be classified into regular and irregular
forms depending on the bond between isoprene units,
or into monoterpenes, sesquiterpenes and diterpenes
according to the number of isoprene units [1,16,61].
FDSs are involved in the biosynthesis of sesquiterpenes,
and are encoded by a small gene family. It
seems that this gene family has experienced lineage-specific
gene expansions multiple times. For example,
the two copies of Arabidopsis thaliana formed a
species-specific clade (Additional file 3). FDS copies
from Oryza sativa and Sorghum bicolor formed a
Poaceae-specific clade (Additional file 3). Based on
phylogenetic analyses (Figure 1B), we showed
that two rounds of gene duplication occurred in the
evolutionary history of the Asteraceae FDS gene family.
The first round of duplication appears to have occurred
in the common ancestor of the Asteraceae, since the
genes from Asteraceae formed a monophyletic group
separated from the clusters for species from other families
(Figure 1B and Additional file 3), and even from the
species of two families closely related to Asteraceae
(Nymphoides peltata of Menyanthaceae and Platycodon
grandiflorus of Campanulaceae; The Compositae Genome
Project, personal communication). The FDS gene
duplications in Asteraceae might contribute to the diversity
of their sesquiterpenes because of the role of FDSs in
the biosynthesis of sesquiterpenes. This is consistent with 
the large number of sesquiterpenes that have been 
extracted from Asteraceae [62,63]. 

The second duplication, which generated the lineage 
of CDS, occurred after the divergence of the Mutisieae 
from the other tribes of Asteraceae and before the divergence
of the tribe Anthemideae. The evidence includes
1) one FDS copy in G. anandria (Mutisieae) clustered
with FDS1, while the other clustered with the ancestor
of FDS2 and CDS; 2) all of the sampled Anthemideae
species had three FDS copies (FDS1, FDS2, CDS); and 3)
CDS was also cloned from Aster ageratoides (tribe Astereae)
and Helianthus annuus (tribe Heliantheae), which are
close relatives of the tribe Anthemideae [39,41]. After
its origin, CDS acquired a new function in the
biosynthetic pathway of irregular monoterpenes.
The CDS gene is common in the tribe Anthemideae,
which is consistent with the fact that its products are
typically found in Anthemideae species. Our results suggest
that the duplication and divergence of FDS genes
have played a major role in determining the novelty of
irregular monoterpenes in Anthemideae.

After gene duplication, CDS accumulated amino-acid
changes associated with its change of substrate. CDSs have
four substitutions in the substrate-binding and catalysis
sites of FDS: T201 → G244 (or S244), F239 →
Y281, D243 → N285, and R351 → G393. Substitutions
can be divided into either radical or conservative,
based on the biochemical properties of the
amino-acids [64-67]. For example, substitutions associated
with a change of polarity group are defined as radical
and those with the polarity group unaltered as
conservative [65,67]. In the present study, except for
T201 → S244, all the substitutions are radical, which is
consistent with the fact that the evolution of new function 
requires alterations in the biochemical properties of the 
amino-acid sequence [67]. F239 and R351 in FDS are 
involved in binding of the nucleophilic substrate IPP 
[30,31]. The radical replacements of these two sites in 
CDS are in good agreement with the finding that CDS 
does not prefer IPP as a nucleophilic reagent. F239 in FDS 
binds IPP through hydrophobic interactions [30,31]. The 
corresponding residue is Y281 in CDS, which, owing to 
the polarity of the hydroxyl, may not interact with IPP 
through hydrophobic interactions. R351 in FDS interacts
with the pyrophosphate moiety of IPP through water-mediated
hydrogen bonds [30,31] (Figure 2B). The radical
replacement R351→G393 involves changes in the
charge (R is positively charged and G is nonpolar) and the
molecular volume of the amino-acids (R has a larger side
chain than G), which could affect IPP binding. Hence,
these two substitutions might explain why CDS does not
prefer IPP as a substrate.
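
The radical/conservative labelling discussed above can be illustrated with a minimal sketch. It assumes a common textbook grouping of side chains (nonpolar, polar uncharged, acidic, basic); the classification schemes cited in the text [64-67] may group some residues differently, so the grouping table and the function names below are illustrative assumptions rather than the procedure used in this study.

```python
# Assumed textbook side-chain groups (NOT taken from refs [64-67]).
GROUPS = {
    "nonpolar": "GAVLIMFWP",
    "polar_uncharged": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}
# Map each one-letter residue code to its group name.
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}

# Substitutions listed in the text, as (FDS site, CDS site).
SUBSTITUTIONS = [
    ("T201", "G244"),
    ("T201", "S244"),
    ("F239", "Y281"),
    ("D243", "N285"),
    ("R351", "G393"),
]


def classify(fds_residue, cds_residue):
    """Label a replacement 'radical' if the residues belong to different groups."""
    same_group = GROUP_OF[fds_residue] == GROUP_OF[cds_residue]
    return "conservative" if same_group else "radical"


def classify_all(pairs=SUBSTITUTIONS):
    """Classify every (FDS site, CDS site) pair by its leading residue letter."""
    return {f"{fds}->{cds}": classify(fds[0], cds[0]) for fds, cds in pairs}


if __name__ == "__main__":
    for name, label in classify_all().items():
        print(name, label)
```

Under this assumed grouping, only T201→S244 stays within one group; the other replacements cross group boundaries, in line with the statement above.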

Substitutions can change the function of duplicated 
genes, and may be due to either a relaxation of purifying 
selection or to the action of positive selection 
[19,23,32,33]. Branch-site model A provided evidence of
positive selection acting at 29 sites (posterior probability
>80%) along the branch ancestral to CDS (Figure 2A). Interestingly, Y281
(F239 in FDS) and N285 (D243 in FDS) noted above 
were found to be under positive selection by the branch- 
site model, suggesting the important role of positive 
selection in the functional evolution of CDS. The bio- 
chemical context of substitutions that were under posi- 
tive selection is consistent with a scenario involving the 
adaptive evolution of CDS. These sites (p>80%) were 
scattered throughout the primary sequences (Figure 2A), 
whereas in the three-dimensional structures (Figure 2C), 
they clustered in the large central cavity. Among these 
sites, two (102M and 103V) were located in conserved 
region I, and nine (279M, 281Y, 285N, 290T, 293D, 
295D, 300T, 306E and 307C) were located in conserved 
region V. They are all conserved in the FDS gene family 
and important for the precise function of the protein 
[27-31]. The mutations at these sites suggested that their 
importance for the enzymatic activity of FDS was altered 
in CDS. A few sites that were detected to have experienced
positive selection with high probability may be
responsible for the novel function of CDS. Further studies
using site-directed mutagenesis are needed to determine
whether these positively-selected sites, especially
those with high posterior probability (103V, 218T and
290T), confer an ability on CDS to discriminate different
substrate types.
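
As background to the branch-site results discussed above, evidence for positive selection under branch-site model A is conventionally assessed with a likelihood ratio test against a null model in which the foreground ω is fixed at 1. The sketch below shows only the arithmetic of such a test, assuming a chi-square reference distribution with one degree of freedom; the function name and the log-likelihood values in the usage line are hypothetical placeholders, not results from this study.

```python
import math


def branch_site_lrt(lnl_null, lnl_alt):
    """Return (2*delta_lnL, p-value) for a one-degree-of-freedom LRT.

    lnl_null: log-likelihood of the null model (foreground omega fixed at 1).
    lnl_alt:  log-likelihood of branch-site model A (foreground omega free).
    """
    # The statistic is clipped at zero in case of minor optimisation noise.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For one degree of freedom, the chi-square survival function
    # equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only.
    stat, p = branch_site_lrt(lnl_null=-1003.0, lnl_alt=-1000.0)
    print(f"2*delta_lnL = {stat:.2f}, p = {p:.4f}")
```

Sites are then usually reported by their posterior probability of belonging to the positively selected class, which is the sense of the 80% and 95% thresholds quoted in the text.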

It has been proposed that positive selection promotes 
the functional divergence of gene family members en- 
coding enzymes involved in secondary metabolism be- 
cause secondary products are considered to be a 
response to challenges imposed by the environment 
[36,37]. For example, the methylthioalkylmalate synthase
gene (MAM) controls an early step in the biosynthesis
of glucosinolates, which play an important role in the
defense of Arabidopsis thaliana and other crucifers against
herbivorous insects [37]. Benderoth et al. [37] found that
positive selection had driven the evolution of MAM2,
which originated from a lineage-specific duplication of
MAMa in A. thaliana. Another example is the SABATH
gene family of methyltransferases, which encodes 
enzymes catalyzing the formation of a variety of second- 
ary metabolites in plants such as those that contribute 
to floral scent and plant defense. Branch-site analysis 
suggested that positive selection for a single amino-acid 
change promoted the substrate discrimination of salicylic
acid methyltransferase [68]. Here, we provide an
additional example in which positive selection has promoted
the functional divergence of duplicated genes in a
secondary metabolic pathway. The adaptive evolution of
the CDS gene at the molecular level is consistent with
the adaptive roles of the products of CDS (irregular
monoterpenes), which play an important role in plant
survival.

Many models have been proposed to explain the evolu- 
tionary fates of duplicated genes, including neofunctiona- 
lization, duplication-degeneration-complementation (or 
subfunctionalization) and escape from adaptive conflict 
(EAC) models [18-25]. In contrast to the CDS gene, in
which adaptive selection has been detected, the FDS2 gene,
the sister duplicate of CDS, did not show any signature
of positive selection. Thus, the evolution of FDS2 and CDS
does not appear to be consistent with the
predictions of the EAC model, in which both duplicated
copies would evolve under positive selection [24,25].
In particular, CDS has a ~50-amino-acid extension at its
N-terminus, which was identified as a plastidial transit
peptide, in agreement with the Category II-c model
(gene duplication with a modified function) [24]. However,
the peptide of CDS shares little similarity with any
other sequence database entries in BLAST searches. Further
work, including functional analysis and exploration
of the origin of the CDS peptide, would provide
insights into the evolutionary fate of the FDS gene family
in Asteraceae.

Conclusions

Based on phylogenetic analyses of FDS sequences, we 
demonstrated that two duplication events occurred in 
the evolution of the Asteraceae FDS gene family. The 
first occurred in the common ancestor of the Asteraceae 
and the second after the divergence of the Mutisieae
from the other tribes, but before the birth of the Anthe- 
mideae tribe. We found that CDS accumulated four 
mutations in sites homologous to the substrate-binding 
and catalysis sites of FDS: T201→G244 in conserved
region IV, D243→N285 at the first aspartate of conserved
region V, F239→Y281 in region V, and R351→G393
in the C-terminal region. Of the four replaced sites of
FDS, F239 and R351 are involved in the binding of the 
nucleophilic substrate isopentenyl diphosphate. Likelihood
analyses of a branch-site model provided evidence
of positive selection acting on 29 sites (posterior
probability >80%) and 3 sites (posterior probability >95%)
on the branch ancestral to CDS. Positive selection
associated with gene duplication has played a major
role in the evolution of CDS.

Additional file 1: Primers for RACE.

Additional file 2: Sequences included in this study. 

Additional file 3: Maximum likelihood tree of FDS gene family from 
different plants. Numbers next to branches are bootstrap percentages 
from Maximum Likelihood analysis, and posterior probabilities from 
Bayesian analysis. 